Overview

Dataset statistics

Number of variables: 32
Number of observations: 226537
Missing cells: 1096719
Missing cells (%): 15.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 24.9 MiB
Average record size in memory: 115.4 B
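
The missing-cell percentage can be reproduced from the counts above, since the table holds variables × observations cells in total. A minimal sketch of that arithmetic, using only the figures reported here (the variable names are illustrative):

```python
# Figures copied from the dataset statistics above.
n_variables = 32
n_observations = 226537
missing_cells = 1096719

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```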

Variable types

Numeric: 8
DateTime: 5
Categorical: 19

Alerts

ft_tu has a high cardinality: 55 distinct values High cardinality
ft_vm has a high cardinality: 18232 distinct values High cardinality
fg_startort has a high cardinality: 12165 distinct values High cardinality
fg_zielort has a high cardinality: 11995 distinct values High cardinality
ft_startort has a high cardinality: 6252 distinct values High cardinality
ft_zielort has a high cardinality: 5341 distinct values High cardinality
df_index is highly correlated with participant_id High correlation
participant_id is highly correlated with df_index High correlation
u_fahrausweis is highly correlated with S_AB3_HTA High correlation
S_AB3_HTA is highly correlated with u_fahrausweis and 1 other field High correlation
u_ticket is highly correlated with u_ga High correlation
u_ga is highly correlated with S_AB3_HTA and 1 other field High correlation
S_alter has 5703 (2.5%) missing values Missing
S_sex has 5406 (2.4%) missing values Missing
S_wohnsitz has 5405 (2.4%) missing values Missing
u_klassencode has 5936 (2.6%) missing values Missing
u_ga has 144328 (63.7%) missing values Missing
S_AB3_HTA has 12158 (5.4%) missing values Missing
R_anschluss has 101897 (45.0%) missing values Missing
R_stoerung has 47604 (21.0%) missing values Missing
device_type has 82209 (36.3%) missing values Missing
dispcode has 82209 (36.3%) missing values Missing
u_ticket has 40027 (17.7%) missing values Missing
u_fahrausweis has 110919 (49.0%) missing values Missing
u_preis has 21482 (9.5%) missing values Missing
R_zweck has 5355 (2.4%) missing values Missing
ft_abfahrt has 38301 (16.9%) missing values Missing
ft_ankunft has 38301 (16.9%) missing values Missing
ft_startort_uic has 38301 (16.9%) missing values Missing
ft_tu has 38301 (16.9%) missing values Missing
ft_vm has 38301 (16.9%) missing values Missing
ft_vm_kurz has 38301 (16.9%) missing values Missing
ft_zielort_uic has 38301 (16.9%) missing values Missing
fg_abfahrt has 27467 (12.1%) missing values Missing
fg_ankunft has 27467 (12.1%) missing values Missing
fg_startort_uic has 27467 (12.1%) missing values Missing
fg_zielort_uic has 27467 (12.1%) missing values Missing
fg_startort has 6590 (2.9%) missing values Missing
fg_zielort has 6587 (2.9%) missing values Missing
ft_startort has 17466 (7.7%) missing values Missing
ft_zielort has 17463 (7.7%) missing values Missing
ft_startort_uic is highly skewed (γ1 = -38.85345395) Skewed
fg_startort_uic is highly skewed (γ1 = -54.12944026) Skewed
fg_zielort_uic is highly skewed (γ1 = -47.23386188) Skewed
df_index has unique values Unique
participant_id has unique values Unique
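
The cardinality and skewness alerts above behave like simple threshold checks. The sketch below is an illustration only: the cutoffs of 50 distinct values and |γ1| > 20 are assumptions chosen to be consistent with the alerts listed in this report, not values the report itself states, and `classify_alerts` is a hypothetical helper rather than part of pandas-profiling.

```python
def classify_alerts(distinct=None, skewness=None,
                    cardinality_cutoff=50, skewness_cutoff=20.0):
    """Return alert labels for one variable (both cutoffs are assumed)."""
    alerts = []
    # Cardinality is only meaningful for categorical variables.
    if distinct is not None and distinct > cardinality_cutoff:
        alerts.append("High cardinality")
    # Skewness is flagged on its magnitude, regardless of sign.
    if skewness is not None and abs(skewness) > skewness_cutoff:
        alerts.append("Skewed")
    return alerts


print(classify_alerts(distinct=55))            # ft_tu
print(classify_alerts(skewness=-38.85345395))  # ft_startort_uic
print(classify_alerts(skewness=19.86197468))   # u_preis
```

Under these assumed cutoffs, u_preis (skewness 19.86) stays just below the skewness alert, which matches its absence from the list above.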

Reproduction

Analysis started: 2022-11-18 15:24:33.063971
Analysis finished: 2022-11-18 15:34:21.657542
Duration: 9 minutes and 48.59 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json
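
The reported duration follows directly from the two timestamps. A small sketch with the standard library, using only the values shown above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2022-11-18 15:24:33.063971")
finished = datetime.fromisoformat("2022-11-18 15:34:21.657542")

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```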

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 226537
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 113441.2574
Minimum: 0
Maximum: 229488
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 11499.8
Q1: 56807
median: 113441
Q3: 170075
95-th percentile: 215382.2
Maximum: 229488
Range: 229488
Interquartile range (IQR): 113268

Descriptive statistics

Standard deviation: 65397.92223
Coefficient of variation (CV): 0.576491514
Kurtosis: -1.199839849
Mean: 113441.2574
Median Absolute Deviation (MAD): 56634
Skewness: 2.43676751 × 10^-5
Sum: 2.569864213 × 10^10
Variance: 4276888232
Monotonicity: Not monotonic
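
Several of the df_index statistics are derived from one another, so they can be cross-checked. A minimal sketch using only the figures reported above (variable names are illustrative):

```python
# Figures copied from the quantile and descriptive statistics of df_index.
q1, q3 = 56807, 170075
minimum, maximum = 0, 229488
mean = 113441.2574
std = 65397.92223

iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum
cv = std / mean                # coefficient of variation
variance = std ** 2

print(iqr, value_range)
print(f"CV = {cv:.6f}, variance = {variance:.0f}")
```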
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
226335                  1        < 0.1%
75813                   1        < 0.1%
75844                   1        < 0.1%
75846                   1        < 0.1%
75827                   1        < 0.1%
75826                   1        < 0.1%
75825                   1        < 0.1%
75824                   1        < 0.1%
75808                   1        < 0.1%
75809                   1        < 0.1%
Other values (226527)   226527   > 99.9%

Value   Count   Frequency (%)
0       1       < 0.1%
1       1       < 0.1%
2       1       < 0.1%
3       1       < 0.1%
4       1       < 0.1%
5       1       < 0.1%
6       1       < 0.1%
7       1       < 0.1%
8       1       < 0.1%
9       1       < 0.1%

Value    Count   Frequency (%)
229488   1       < 0.1%
229345   1       < 0.1%
229287   1       < 0.1%
228803   1       < 0.1%
228738   1       < 0.1%
228717   1       < 0.1%
228687   1       < 0.1%
228498   1       < 0.1%
228420   1       < 0.1%
228400   1       < 0.1%

participant_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 226537
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 371630.3177
Minimum: 642
Maximum: 587992
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 642
5-th percentile: 66039.2
Q1: 338200
median: 425687
Q3: 503943
95-th percentile: 568290.2
Maximum: 587992
Range: 587350
Interquartile range (IQR): 165743

Descriptive statistics

Standard deviation: 171154.2938
Coefficient of variation (CV): 0.4605498681
Kurtosis: -0.8145644217
Mean: 371630.3177
Median Absolute Deviation (MAD): 82945
Skewness: -0.7893698922
Sum: 8.418801728 × 10^10
Variance: 2.929379229 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                   Count    Frequency (%)
583619                  1        < 0.1%
364462                  1        < 0.1%
364508                  1        < 0.1%
364512                  1        < 0.1%
364483                  1        < 0.1%
364482                  1        < 0.1%
364481                  1        < 0.1%
364480                  1        < 0.1%
364455                  1        < 0.1%
364456                  1        < 0.1%
Other values (226527)   226527   > 99.9%

Value   Count   Frequency (%)
642     1       < 0.1%
657     1       < 0.1%
24756   1       < 0.1%
25620   1       < 0.1%
41215   1       < 0.1%
41305   1       < 0.1%
41334   1       < 0.1%
41376   1       < 0.1%
41423   1       < 0.1%
41459   1       < 0.1%

Value    Count   Frequency (%)
587992   1       < 0.1%
587803   1       < 0.1%
587733   1       < 0.1%
587048   1       < 0.1%
586957   1       < 0.1%
586932   1       < 0.1%
586890   1       < 0.1%
586617   1       < 0.1%
586506   1       < 0.1%
586478   1       < 0.1%

u_date
Date

Distinct: 1314
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 1.7 MiB
Minimum: 2019-01-02 00:00:00
Maximum: 2022-10-30 00:00:00
Histogram with fixed size bins (bins=50)

S_alter
Real number (ℝ≥0)

MISSING

Distinct: 89
Distinct (%): < 0.1%
Missing: 5703
Missing (%): 2.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.57685411
Minimum: 10
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 10
5-th percentile: 23
Q1: 40
median: 53
Q3: 64
95-th percentile: 75
Maximum: 98
Range: 88
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 15.93574306
Coefficient of variation (CV): 0.308970823
Kurtosis: -0.6226844772
Mean: 51.57685411
Median Absolute Deviation (MAD): 12
Skewness: -0.3044130342
Sum: 11389923
Variance: 253.9479069
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
55                  6238     2.8%
60                  6051     2.7%
50                  5874     2.6%
56                  5434     2.4%
58                  5431     2.4%
57                  5413     2.4%
52                  5364     2.4%
65                  5293     2.3%
54                  5167     2.3%
53                  5129     2.3%
Other values (79)   165440   73.0%
(Missing)           5703     2.5%

Value   Count   Frequency (%)
10      74      < 0.1%
11      41      < 0.1%
12      48      < 0.1%
13      68      < 0.1%
14      226     0.1%
15      558     0.2%
16      1184    0.5%
17      1484    0.7%
18      1620    0.7%
19      1408    0.6%

Value   Count   Frequency (%)
98      5       < 0.1%
97      2       < 0.1%
96      2       < 0.1%
95      8       < 0.1%
94      10      < 0.1%
93      7       < 0.1%
92      8       < 0.1%
91      26      < 0.1%
90      63      < 0.1%
89      54      < 0.1%

S_sex
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 5406
Missing (%): 2.4%
Memory size: 221.5 KiB
weiblich: 121924
männlich: 98553
divers: 654

Length

Max length8
Median length8
Mean length7.994084954
Min length6

Characters and Unicode

Total characters1767740
Distinct characters14
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks2 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
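
The length and character totals for S_sex can be cross-checked against the category counts: the total number of characters is the sum of label length times count, and the mean length divides that by the number of non-missing values. A minimal sketch, using only figures from this section:

```python
# Category counts for S_sex (missing values excluded).
# "m\u00e4nnlich" is "männlich" with a precomposed a-umlaut (8 characters).
counts = {"weiblich": 121924, "m\u00e4nnlich": 98553, "divers": 654}

total_characters = sum(len(label) * n for label, n in counts.items())
non_missing = sum(counts.values())
mean_length = total_characters / non_missing

print(total_characters, non_missing, round(mean_length, 9))
```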

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowweiblich
2nd rowmännlich
3rd rowweiblich
4th rowweiblich
5th rowweiblich

Common Values

Value       Count    Frequency (%)
weiblich    121924   53.8%
männlich    98553    43.5%
divers      654      0.3%
(Missing)   5406     2.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
weiblich121924
55.1%
männlich98553
44.6%
divers654
 
0.3%
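
The S_sex counts appear with two sets of percentages (53.8% / 43.5% and 55.1% / 44.6%). The difference is the denominator: one table divides by all rows, including the 5406 missing values, and the other only by rows with a value. A short sketch of both computations, using the counts reported in this section:

```python
# "m\u00e4nnlich" is "männlich" written with an escape for the a-umlaut.
counts = {"weiblich": 121924, "m\u00e4nnlich": 98553, "divers": 654}
n_rows = 226537
missing = 5406

# Share of all rows (missing values count toward the denominator).
share_of_all_rows = {k: round(100 * v / n_rows, 1) for k, v in counts.items()}
# Share of rows that actually have a value.
share_of_answers = {
    k: round(100 * v / (n_rows - missing), 1) for k, v in counts.items()
}

print(share_of_all_rows)
print(share_of_answers)
```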

Most occurring characters

ValueCountFrequency (%)
i343055
19.4%
l220477
12.5%
c220477
12.5%
h220477
12.5%
n197106
11.2%
e122578
 
6.9%
w121924
 
6.9%
b121924
 
6.9%
m98553
 
5.6%
ä98553
 
5.6%
Other values (4)2616
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1767740
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i343055
19.4%
l220477
12.5%
c220477
12.5%
h220477
12.5%
n197106
11.2%
e122578
 
6.9%
w121924
 
6.9%
b121924
 
6.9%
m98553
 
5.6%
ä98553
 
5.6%
Other values (4)2616
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin1767740
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
i343055
19.4%
l220477
12.5%
c220477
12.5%
h220477
12.5%
n197106
11.2%
e122578
 
6.9%
w121924
 
6.9%
b121924
 
6.9%
m98553
 
5.6%
ä98553
 
5.6%
Other values (4)2616
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1669187
94.4%
None98553
 
5.6%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i343055
20.6%
l220477
13.2%
c220477
13.2%
h220477
13.2%
n197106
11.8%
e122578
 
7.3%
w121924
 
7.3%
b121924
 
7.3%
m98553
 
5.9%
d654
 
< 0.1%
Other values (3)1962
 
0.1%
None
ValueCountFrequency (%)
ä98553
100.0%

S_wohnsitz
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 5405
Missing (%): 2.4%
Memory size: 221.5 KiB
In der Schweiz / Liechtenstein: 215855
In einem anderen Land: 5277

Length

Max length30
Median length30
Mean length29.78522783
Min length21

Characters and Unicode

Total characters6586467
Distinct characters18
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowIn der Schweiz / Liechtenstein
2nd rowIn der Schweiz / Liechtenstein
3rd rowIn der Schweiz / Liechtenstein
4th rowIn der Schweiz / Liechtenstein
5th rowIn der Schweiz / Liechtenstein

Common Values

Value                            Count    Frequency (%)
In der Schweiz / Liechtenstein   215855   95.3%
In einem anderen Land            5277     2.3%
(Missing)                        5405     2.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
in221132
20.1%
der215855
19.6%
schweiz215855
19.6%
215855
19.6%
liechtenstein215855
19.6%
einem5277
 
0.5%
anderen5277
 
0.5%
land5277
 
0.5%

Most occurring characters

ValueCountFrequency (%)
e1100383
16.7%
879251
13.3%
n673950
10.2%
i652842
9.9%
c431710
 
6.6%
h431710
 
6.6%
t431710
 
6.6%
d226409
 
3.4%
I221132
 
3.4%
L221132
 
3.4%
Other values (8)1316238
20.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter4833242
73.4%
Space Separator879251
 
13.3%
Uppercase Letter658119
 
10.0%
Other Punctuation215855
 
3.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e1100383
22.8%
n673950
13.9%
i652842
13.5%
c431710
 
8.9%
h431710
 
8.9%
t431710
 
8.9%
d226409
 
4.7%
r221132
 
4.6%
s215855
 
4.5%
w215855
 
4.5%
Other values (3)231686
 
4.8%
Uppercase Letter
ValueCountFrequency (%)
I221132
33.6%
L221132
33.6%
S215855
32.8%
Space Separator
ValueCountFrequency (%)
879251
100.0%
Other Punctuation
ValueCountFrequency (%)
/215855
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5491361
83.4%
Common1095106
 
16.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e1100383
20.0%
n673950
12.3%
i652842
11.9%
c431710
 
7.9%
h431710
 
7.9%
t431710
 
7.9%
d226409
 
4.1%
I221132
 
4.0%
L221132
 
4.0%
r221132
 
4.0%
Other values (6)879251
16.0%
Common
ValueCountFrequency (%)
879251
80.3%
/215855
 
19.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII6586467
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e1100383
16.7%
879251
13.3%
n673950
10.2%
i652842
9.9%
c431710
 
6.6%
h431710
 
6.6%
t431710
 
6.6%
d226409
 
3.4%
I221132
 
3.4%
L221132
 
3.4%
Other values (8)1316238
20.0%

u_klassencode
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 5936
Missing (%): 2.6%
Memory size: 221.5 KiB
2. Klasse: 192601
1. Klasse: 28000

Length

Max length9
Median length9
Mean length9
Min length9

Characters and Unicode

Total characters1985409
Distinct characters9
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2. Klasse
2nd row2. Klasse
3rd row2. Klasse
4th row2. Klasse
5th row2. Klasse

Common Values

Value       Count    Frequency (%)
2. Klasse   192601   85.0%
1. Klasse   28000    12.4%
(Missing)   5936     2.6%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
klasse220601
50.0%
2192601
43.7%
128000
 
6.3%

Most occurring characters

ValueCountFrequency (%)
s441202
22.2%
.220601
11.1%
220601
11.1%
K220601
11.1%
l220601
11.1%
a220601
11.1%
e220601
11.1%
2192601
9.7%
128000
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1103005
55.6%
Other Punctuation220601
 
11.1%
Space Separator220601
 
11.1%
Uppercase Letter220601
 
11.1%
Decimal Number220601
 
11.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s441202
40.0%
l220601
20.0%
a220601
20.0%
e220601
20.0%
Decimal Number
ValueCountFrequency (%)
2192601
87.3%
128000
 
12.7%
Other Punctuation
ValueCountFrequency (%)
.220601
100.0%
Space Separator
ValueCountFrequency (%)
220601
100.0%
Uppercase Letter
ValueCountFrequency (%)
K220601
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1323606
66.7%
Common661803
33.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
s441202
33.3%
K220601
16.7%
l220601
16.7%
a220601
16.7%
e220601
16.7%
Common
ValueCountFrequency (%)
.220601
33.3%
220601
33.3%
2192601
29.1%
128000
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII1985409
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s441202
22.2%
.220601
11.1%
220601
11.1%
K220601
11.1%
l220601
11.1%
a220601
11.1%
e220601
11.1%
2192601
9.7%
128000
 
1.4%

u_ga
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 144328
Missing (%): 63.7%
Memory size: 221.5 KiB
kein GA: 75456
besitzt GA: 6753

Length

Max length10
Median length7
Mean length7.246432872
Min length7

Characters and Unicode

Total characters595722
Distinct characters11
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowkein GA
2nd rowkein GA
3rd rowkein GA
4th rowkein GA
5th rowkein GA

Common Values

Value        Count    Frequency (%)
kein GA      75456    33.3%
besitzt GA   6753     3.0%
(Missing)    144328   63.7%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ga82209
50.0%
kein75456
45.9%
besitzt6753
 
4.1%

Most occurring characters

ValueCountFrequency (%)
e82209
13.8%
i82209
13.8%
82209
13.8%
G82209
13.8%
A82209
13.8%
k75456
12.7%
n75456
12.7%
t13506
 
2.3%
b6753
 
1.1%
s6753
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter349095
58.6%
Uppercase Letter164418
27.6%
Space Separator82209
 
13.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e82209
23.5%
i82209
23.5%
k75456
21.6%
n75456
21.6%
t13506
 
3.9%
b6753
 
1.9%
s6753
 
1.9%
z6753
 
1.9%
Uppercase Letter
ValueCountFrequency (%)
G82209
50.0%
A82209
50.0%
Space Separator
ValueCountFrequency (%)
82209
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin513513
86.2%
Common82209
 
13.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e82209
16.0%
i82209
16.0%
G82209
16.0%
A82209
16.0%
k75456
14.7%
n75456
14.7%
t13506
 
2.6%
b6753
 
1.3%
s6753
 
1.3%
z6753
 
1.3%
Common
ValueCountFrequency (%)
82209
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII595722
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e82209
13.8%
i82209
13.8%
82209
13.8%
G82209
13.8%
A82209
13.8%
k75456
12.7%
n75456
12.7%
t13506
 
2.3%
b6753
 
1.1%
s6753
 
1.1%

S_AB3_HTA
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 12158
Missing (%): 5.4%
Memory size: 221.5 KiB
ja: 177292
nein: 37087

Length

Max length4
Median length2
Mean length2.34599471
Min length2

Characters and Unicode

Total characters502932
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowja
2nd rowja
3rd rownein
4th rownein
5th rowja

Common Values

Value       Count    Frequency (%)
ja          177292   78.3%
nein        37087    16.4%
(Missing)   12158    5.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ja177292
82.7%
nein37087
 
17.3%

Most occurring characters

ValueCountFrequency (%)
j177292
35.3%
a177292
35.3%
n74174
14.7%
e37087
 
7.4%
i37087
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter502932
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
j177292
35.3%
a177292
35.3%
n74174
14.7%
e37087
 
7.4%
i37087
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
Latin502932
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
j177292
35.3%
a177292
35.3%
n74174
14.7%
e37087
 
7.4%
i37087
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII502932
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
j177292
35.3%
a177292
35.3%
n74174
14.7%
e37087
 
7.4%
i37087
 
7.4%

R_anschluss
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 101897
Missing (%): 45.0%
Memory size: 221.5 KiB
Ja: 118789
Nein: 5851

Length

Max length4
Median length2
Mean length2.093886393
Min length2

Characters and Unicode

Total characters260982
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJa
2nd rowJa
3rd rowJa
4th rowJa
5th rowJa

Common Values

Value       Count    Frequency (%)
Ja          118789   52.4%
Nein        5851     2.6%
(Missing)   101897   45.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
ja118789
95.3%
nein5851
 
4.7%

Most occurring characters

ValueCountFrequency (%)
J118789
45.5%
a118789
45.5%
N5851
 
2.2%
e5851
 
2.2%
i5851
 
2.2%
n5851
 
2.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter136342
52.2%
Uppercase Letter124640
47.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a118789
87.1%
e5851
 
4.3%
i5851
 
4.3%
n5851
 
4.3%
Uppercase Letter
ValueCountFrequency (%)
J118789
95.3%
N5851
 
4.7%

Most occurring scripts

ValueCountFrequency (%)
Latin260982
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
J118789
45.5%
a118789
45.5%
N5851
 
2.2%
e5851
 
2.2%
i5851
 
2.2%
n5851
 
2.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII260982
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
J118789
45.5%
a118789
45.5%
N5851
 
2.2%
e5851
 
2.2%
i5851
 
2.2%
n5851
 
2.2%

R_stoerung
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 47604
Missing (%): 21.0%
Memory size: 221.5 KiB
Nein: 165161
Ja: 13772

Length

Max length4
Median length4
Mean length3.846065287
Min length2

Characters and Unicode

Total characters688188
Distinct characters6
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNein
2nd rowNein
3rd rowNein
4th rowNein
5th rowNein

Common Values

Value       Count    Frequency (%)
Nein        165161   72.9%
Ja          13772    6.1%
(Missing)   47604    21.0%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
nein165161
92.3%
ja13772
 
7.7%

Most occurring characters

ValueCountFrequency (%)
N165161
24.0%
e165161
24.0%
i165161
24.0%
n165161
24.0%
J13772
 
2.0%
a13772
 
2.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter509255
74.0%
Uppercase Letter178933
 
26.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e165161
32.4%
i165161
32.4%
n165161
32.4%
a13772
 
2.7%
Uppercase Letter
ValueCountFrequency (%)
N165161
92.3%
J13772
 
7.7%

Most occurring scripts

ValueCountFrequency (%)
Latin688188
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N165161
24.0%
e165161
24.0%
i165161
24.0%
n165161
24.0%
J13772
 
2.0%
a13772
 
2.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII688188
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N165161
24.0%
e165161
24.0%
i165161
24.0%
n165161
24.0%
J13772
 
2.0%
a13772
 
2.0%

device_type
Categorical

MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 82209
Missing (%): 36.3%
Memory size: 221.5 KiB
Desktop: 96067
Smartphone: 48261

Length

Max length10
Median length7
Mean length8.003152541
Min length7

Characters and Unicode

Total characters1155079
Distinct characters13
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowDesktop
2nd rowDesktop
3rd rowDesktop
4th rowDesktop
5th rowDesktop

Common Values

Value        Count   Frequency (%)
Desktop      96067   42.4%
Smartphone   48261   21.3%
(Missing)    82209   36.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
desktop96067
66.6%
smartphone48261
33.4%

Most occurring characters

ValueCountFrequency (%)
e144328
12.5%
t144328
12.5%
o144328
12.5%
p144328
12.5%
D96067
8.3%
s96067
8.3%
k96067
8.3%
S48261
 
4.2%
m48261
 
4.2%
a48261
 
4.2%
Other values (3)144783
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1010751
87.5%
Uppercase Letter144328
 
12.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e144328
14.3%
t144328
14.3%
o144328
14.3%
p144328
14.3%
s96067
9.5%
k96067
9.5%
m48261
 
4.8%
a48261
 
4.8%
r48261
 
4.8%
h48261
 
4.8%
Uppercase Letter
ValueCountFrequency (%)
D96067
66.6%
S48261
33.4%

Most occurring scripts

ValueCountFrequency (%)
Latin1155079
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e144328
12.5%
t144328
12.5%
o144328
12.5%
p144328
12.5%
D96067
8.3%
s96067
8.3%
k96067
8.3%
S48261
 
4.2%
m48261
 
4.2%
a48261
 
4.2%
Other values (3)144783
12.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII1155079
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e144328
12.5%
t144328
12.5%
o144328
12.5%
p144328
12.5%
D96067
8.3%
s96067
8.3%
k96067
8.3%
S48261
 
4.2%
m48261
 
4.2%
a48261
 
4.2%
Other values (3)144783
12.5%

dispcode
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 82209
Missing (%): 36.3%
Memory size: 221.5 KiB
Beendet: 110786
Ausgescreent: 30645
Beendet nach Unterbrechung: 2897

Length

Max length26
Median length7
Mean length8.44301868
Min length7

Characters and Unicode

Total characters1218564
Distinct characters16
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowBeendet
2nd rowBeendet
3rd rowAusgescreent
4th rowBeendet
5th rowBeendet nach Unterbrechung

Common Values

Value                        Count    Frequency (%)
Beendet                      110786   48.9%
Ausgescreent                 30645    13.5%
Beendet nach Unterbrechung   2897     1.3%
(Missing)                    82209    36.3%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
beendet113683
75.7%
ausgescreent30645
 
20.4%
nach2897
 
1.9%
unterbrechung2897
 
1.9%

Most occurring characters

ValueCountFrequency (%)
e438778
36.0%
n153019
 
12.6%
t147225
 
12.1%
B113683
 
9.3%
d113683
 
9.3%
s61290
 
5.0%
c36439
 
3.0%
r36439
 
3.0%
u33542
 
2.8%
g33542
 
2.8%
Other values (6)50924
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1065545
87.4%
Uppercase Letter147225
 
12.1%
Space Separator5794
 
0.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e438778
41.2%
n153019
 
14.4%
t147225
 
13.8%
d113683
 
10.7%
s61290
 
5.8%
c36439
 
3.4%
r36439
 
3.4%
u33542
 
3.1%
g33542
 
3.1%
h5794
 
0.5%
Other values (2)5794
 
0.5%
Uppercase Letter
ValueCountFrequency (%)
B113683
77.2%
A30645
 
20.8%
U2897
 
2.0%
Space Separator
ValueCountFrequency (%)
5794
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1212770
99.5%
Common5794
 
0.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e438778
36.2%
n153019
 
12.6%
t147225
 
12.1%
B113683
 
9.4%
d113683
 
9.4%
s61290
 
5.1%
c36439
 
3.0%
r36439
 
3.0%
u33542
 
2.8%
g33542
 
2.8%
Other values (5)45130
 
3.7%
Common
ValueCountFrequency (%)
5794
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1218564
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e438778
36.0%
n153019
 
12.6%
t147225
 
12.1%
B113683
 
9.3%
d113683
 
9.3%
s61290
 
5.0%
c36439
 
3.0%
r36439
 
3.0%
u33542
 
2.8%
g33542
 
2.8%
Other values (6)50924
 
4.2%

u_ticket
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 40027
Missing (%): 17.7%
Memory size: 221.6 KiB
Mobile-Ticket: 155385
Online-Ticket: 24915
Easy Ride: 5904
bedienter Vertrieb: 306

Length

Max length18
Median length13
Mean length12.88158276
Min length9

Characters and Unicode

Total characters2402544
Distinct characters22
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMobile-Ticket
2nd rowMobile-Ticket
3rd rowMobile-Ticket
4th rowMobile-Ticket
5th rowMobile-Ticket

Common Values

Value                Count    Frequency (%)
Mobile-Ticket        155385   68.6%
Online-Ticket        24915    11.0%
Easy Ride            5904     2.6%
bedienter Vertrieb   306      0.1%
(Missing)            40027    17.7%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
mobile-ticket155385
80.6%
online-ticket24915
 
12.9%
easy5904
 
3.1%
ride5904
 
3.1%
bedienter306
 
0.2%
vertrieb306
 
0.2%

Most occurring characters

ValueCountFrequency (%)
e368034
15.3%
i367116
15.3%
t180912
7.5%
l180300
7.5%
-180300
7.5%
T180300
7.5%
c180300
7.5%
k180300
7.5%
b155997
6.5%
M155385
6.5%
Other values (12)273600
11.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1843320
76.7%
Uppercase Letter372714
 
15.5%
Dash Punctuation180300
 
7.5%
Space Separator6210
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e368034
20.0%
i367116
19.9%
t180912
9.8%
l180300
9.8%
c180300
9.8%
k180300
9.8%
b155997
8.5%
o155385
8.4%
n50136
 
2.7%
d6210
 
0.3%
Other values (4)18630
 
1.0%
Uppercase Letter
ValueCountFrequency (%)
T180300
48.4%
M155385
41.7%
O24915
 
6.7%
E5904
 
1.6%
R5904
 
1.6%
V306
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
-180300
100.0%
Space Separator
ValueCountFrequency (%)
6210
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2216034
92.2%
Common186510
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e368034
16.6%
i367116
16.6%
t180912
8.2%
l180300
8.1%
T180300
8.1%
c180300
8.1%
k180300
8.1%
b155997
7.0%
M155385
7.0%
o155385
7.0%
Other values (10)112005
 
5.1%
Common
ValueCountFrequency (%)
-180300
96.7%
6210
 
3.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII2402544
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e368034
15.3%
i367116
15.3%
t180912
7.5%
l180300
7.5%
-180300
7.5%
T180300
7.5%
c180300
7.5%
k180300
7.5%
b155997
6.5%
M155385
6.5%
Other values (12)273600
11.4%

u_fahrausweis
Categorical

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 110919
Missing (%): 49.0%
Memory size: 221.7 KiB
Normales Billett: 88564
GA: 15147
Sparbillett: 7995
Spartageskarte: 3205
Tageskarte: 424
Other values (2): 283

Length

Max length18
Median length16
Mean length13.7403432
Min length2

Characters and Unicode

Total characters1588631
Distinct characters30
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNormales Billett
2nd rowNormales Billett
3rd rowGA
4th rowNormales Billett
5th rowNormales Billett

Common Values

Value                Count    Frequency (%)
Normales Billett     88564    39.1%
GA                   15147    6.7%
Sparbillett          7995     3.5%
Spartageskarte       3205     1.4%
Tageskarte           424      0.2%
Strecken-/Modulabo   207      0.1%
seven25              76       < 0.1%
(Missing)            110919   49.0%
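
The category counts for u_fahrausweis, together with the missing count, should add up to the number of observations. The sketch below checks this, reading the last category as `seven25` with a count of 76; that reading also agrees with the "Other values (2)" total of 283 shown in the summary.

```python
# Counts copied from the Common Values table for u_fahrausweis.
counts = {
    "Normales Billett": 88564,
    "GA": 15147,
    "Sparbillett": 7995,
    "Spartageskarte": 3205,
    "Tageskarte": 424,
    "Strecken-/Modulabo": 207,
    "seven25": 76,
}
missing = 110919
n_observations = 226537

total = sum(counts.values()) + missing
other_values = counts["Strecken-/Modulabo"] + counts["seven25"]

print(total == n_observations, other_values)
```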

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
normales88564
43.4%
billett88564
43.4%
ga15147
 
7.4%
sparbillett7995
 
3.9%
spartageskarte3205
 
1.6%
tageskarte424
 
0.2%
strecken-/modulabo207
 
0.1%
seven2576
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
l281889
17.7%
t200159
12.6%
e192947
12.1%
a107229
 
6.7%
r103600
 
6.5%
i96559
 
6.1%
s92269
 
5.8%
o88978
 
5.6%
N88564
 
5.6%
m88564
 
5.6%
Other values (20)247873
15.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1280041
80.6%
Uppercase Letter219460
 
13.8%
Space Separator88564
 
5.6%
Dash Punctuation207
 
< 0.1%
Other Punctuation207
 
< 0.1%
Decimal Number152
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l281889
22.0%
t200159
15.6%
e192947
15.1%
a107229
 
8.4%
r103600
 
8.1%
i96559
 
7.5%
s92269
 
7.2%
o88978
 
7.0%
m88564
 
6.9%
p11200
 
0.9%
Other values (8)16647
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
N88564
40.4%
B88564
40.4%
G15147
 
6.9%
A15147
 
6.9%
S11407
 
5.2%
T424
 
0.2%
M207
 
0.1%
Decimal Number
Value  Count  Frequency (%)
2      76     50.0%
5      76     50.0%
Space Separator
Value    Count  Frequency (%)
(space)  88564  100.0%
Dash Punctuation
Value  Count  Frequency (%)
-      207    100.0%
Other Punctuation
Value  Count  Frequency (%)
/      207    100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1499501
94.4%
Common89130
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
l281889
18.8%
t200159
13.3%
e192947
12.9%
a107229
 
7.2%
r103600
 
6.9%
i96559
 
6.4%
s92269
 
6.2%
o88978
 
5.9%
N88564
 
5.9%
m88564
 
5.9%
Other values (15)158743
10.6%
Common
ValueCountFrequency (%)
88564
99.4%
-207
 
0.2%
/207
 
0.2%
276
 
0.1%
576
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1588631
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l281889
17.7%
t200159
12.6%
e192947
12.1%
a107229
 
6.7%
r103600
 
6.5%
i96559
 
6.1%
s92269
 
5.8%
o88978
 
5.6%
N88564
 
5.6%
m88564
 
5.6%
Other values (20)247873
15.6%

u_preis
Real number (ℝ≥0)

MISSING

Distinct: 1447
Distinct (%): 0.7%
Missing: 21482
Missing (%): 9.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.52140499
Minimum: 0
Maximum: 6300
Zeros: 40
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3.8
Q1: 8.1
median: 15.6
Q3: 28.9
95-th percentile: 60.6
Maximum: 6300
Range: 6300
Interquartile range (IQR): 20.8

Descriptive statistics

Standard deviation: 205.6276614
Coefficient of variation (CV): 6.134219656
Kurtosis: 442.8823229
Mean: 33.52140499
Median Absolute Deviation (MAD): 8.9
Skewness: 19.86197468
Sum: 6873731.7
Variance: 42282.73512
Monotonicity: Not monotonic
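
Several of the figures above are derived from others, so they can be cross-checked by hand. The sketch below assumes the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared); the variable names are illustrative, and the input numbers are copied from the u_preis statistics in this report.

```python
import math

# Values copied from the u_preis summary in this report.
mean = 33.52140499
std = 205.6276614
q1, q3 = 8.1, 28.9
minimum, maximum = 0.0, 6300.0
missing, observations = 21482, 226537

cv = std / mean                      # coefficient of variation
iqr = q3 - q1                        # interquartile range
value_range = maximum - minimum      # range
variance = std ** 2                  # variance from the standard deviation
missing_pct = 100 * missing / observations

print(f"CV={cv:.6f} IQR={iqr:.1f} range={value_range:.0f} "
      f"variance={variance:.2f} missing={missing_pct:.1f}%")
```

A CV above 6 together with a median of 15.6 and a maximum of 6300 indicates that a small number of very large prices dominate the mean and the standard deviation.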
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
13                   4852    2.1%
6.8                  4778    2.1%
3.4                  3963    1.7%
8.8                  3027    1.3%
6.2                  2830    1.2%
5.8                  2425    1.1%
28                   2393    1.1%
3.7                  2357    1.0%
14                   2340    1.0%
10.8                 2337    1.0%
Other values (1437)  173753  76.7%
(Missing)            21482   9.5%

Value  Count  Frequency (%)
0      40     < 0.1%
0.5    2      < 0.1%
1.3    2      < 0.1%
1.4    10     < 0.1%
1.5    7      < 0.1%
1.7    2      < 0.1%
2      303    0.1%
2.1    4      < 0.1%
2.2    384    0.2%
2.3    4      < 0.1%

Value  Count  Frequency (%)
6300   55     < 0.1%
4840   22     < 0.1%
4520   1      < 0.1%
4340   11     < 0.1%
4050   4      < 0.1%
3860   200    0.1%
3520   1      < 0.1%
2880   130    0.1%
2700   75     < 0.1%
2650   110    < 0.1%

R_zweck
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 5355
Missing (%): 2.4%
Memory size: 221.5 KiB
Freizeit und Unterhaltung: 147614
Arbeit und Lernen: 58804
Sonstige: 14764

Length

Max length: 25
Median length: 25
Mean length: 21.73834218
Min length: 8

Characters and Unicode

Total characters: 4808130
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Freizeit und Unterhaltung
2nd row: Freizeit und Unterhaltung
3rd row: Freizeit und Unterhaltung
4th row: Freizeit und Unterhaltung
5th row: Freizeit und Unterhaltung

Common Values

Value                      Count   Frequency (%)
Freizeit und Unterhaltung  147614  65.2%
Arbeit und Lernen          58804   26.0%
Sonstige                   14764   6.5%
(Missing)                  5355    2.4%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
und206418
32.6%
freizeit147614
23.3%
unterhaltung147614
23.3%
arbeit58804
 
9.3%
lernen58804
 
9.3%
sonstige14764
 
2.3%

Most occurring characters

ValueCountFrequency (%)
e634018
13.2%
n634018
13.2%
t516410
10.7%
412836
8.6%
r412836
8.6%
i368796
 
7.7%
u354032
 
7.4%
d206418
 
4.3%
g162378
 
3.4%
F147614
 
3.1%
Other values (11)958774
19.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter3967694
82.5%
Uppercase Letter427600
 
8.9%
Space Separator412836
 
8.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e634018
16.0%
n634018
16.0%
t516410
13.0%
r412836
10.4%
i368796
9.3%
u354032
8.9%
d206418
 
5.2%
g162378
 
4.1%
a147614
 
3.7%
l147614
 
3.7%
Other values (5)383560
9.7%
Uppercase Letter
ValueCountFrequency (%)
F147614
34.5%
U147614
34.5%
A58804
 
13.8%
L58804
 
13.8%
S14764
 
3.5%
Space Separator
ValueCountFrequency (%)
412836
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin4395294
91.4%
Common412836
 
8.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e634018
14.4%
n634018
14.4%
t516410
11.7%
r412836
9.4%
i368796
8.4%
u354032
8.1%
d206418
 
4.7%
g162378
 
3.7%
F147614
 
3.4%
a147614
 
3.4%
Other values (10)811160
18.5%
Common
ValueCountFrequency (%)
412836
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII4808130
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e634018
13.2%
n634018
13.2%
t516410
10.7%
412836
8.6%
r412836
8.6%
i368796
 
7.7%
u354032
 
7.4%
d206418
 
4.3%
g162378
 
3.4%
F147614
 
3.1%
Other values (11)958774
19.9%

ft_abfahrt
Date

MISSING

Distinct: 1316
Distinct (%): 0.7%
Missing: 38301
Missing (%): 16.9%
Memory size: 1.7 MiB
Minimum: 2022-11-18 00:00:00
Maximum: 2022-11-18 23:59:00
Histogram with fixed size bins (bins=50)

ft_ankunft
Date

MISSING

Distinct: 1326
Distinct (%): 0.7%
Missing: 38301
Missing (%): 16.9%
Memory size: 1.7 MiB
Minimum: 2022-11-18 00:00:00
Maximum: 2022-11-18 23:59:00
Histogram with fixed size bins (bins=50)

ft_startort_uic
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 2363
Distinct (%): 1.3%
Missing: 38301
Missing (%): 16.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 8508862.205
Minimum: 1101961
Maximum: 8891702
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 1101961
5-th percentile: 8500023
Q1: 8503000
median: 8504300
Q3: 8507000
95-th percentile: 8576333
Maximum: 8891702
Range: 7789741
Interquartile range (IQR): 4000

Descriptive statistics

Standard deviation: 51945.08393
Coefficient of variation (CV): 0.006104821382
Kurtosis: 4650.318474
Mean: 8508862.205
Median Absolute Deviation (MAD): 2150
Skewness: -38.85345395
Sum: 1.601674186 × 10^12
Variance: 2698291745
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
8503000              20030   8.8%
8507000              11187   4.9%
8500010              7302    3.2%
8505000              6770    3.0%
8503016              5553    2.5%
8500218              5215    2.3%
8506000              4188    1.8%
8506302              3042    1.3%
8509000              2394    1.1%
8502113              2352    1.0%
Other values (2353)  120203  53.1%
(Missing)            38301   16.9%

Value    Count  Frequency (%)
1101961  2      < 0.1%
5103865  1      < 0.1%
5510017  2      < 0.1%
8001071  18     < 0.1%
8001093  1      < 0.1%
8002084  1      < 0.1%
8002140  1      < 0.1%
8002181  3      < 0.1%
8002253  3      < 0.1%
8002301  2      < 0.1%

Value    Count  Frequency (%)
8891702  1      < 0.1%
8812005  1      < 0.1%
8775100  1      < 0.1%
8774687  1      < 0.1%
8774549  3      < 0.1%
8771513  1      < 0.1%
8771304  4      < 0.1%
8718206  4      < 0.1%
8596004  10     < 0.1%
8595931  1      < 0.1%

ft_tu
Categorical

HIGH CARDINALITY
MISSING

Distinct: 55
Distinct (%): < 0.1%
Missing: 38301
Missing (%): 16.9%
Memory size: 223.8 KiB
SBB: 146760
BLS: 10454
SOB: 7407
THU: 6928
RhB: 4434
Other values (50): 12253

Length

Max length: 3
Median length: 3
Mean length: 2.972789477
Min length: 1

Characters and Unicode

Total characters: 559586
Distinct characters: 31
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: SBB
2nd row: SBB
3rd row: SBB
4th row: SBB
5th row: SBB

Common Values

Value              Count   Frequency (%)
SBB                146760  64.8%
BLS                10454   4.6%
SOB                7407    3.3%
THU                6928    3.1%
RhB                4434    2.0%
ZB                 3404    1.5%
MGB                1712    0.8%
AVA                1121    0.5%
RBS                1022    0.5%
AB-                985     0.4%
Other values (45)  4009    1.8%
(Missing)          38301   16.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
sbb146760
78.0%
bls10454
 
5.6%
sob7407
 
3.9%
thu6928
 
3.7%
rhb4434
 
2.4%
zb3406
 
1.8%
mgb1712
 
0.9%
ava1121
 
0.6%
rbs1022
 
0.5%
ab985
 
0.5%
Other values (43)4007
 
2.1%

Most occurring characters

ValueCountFrequency (%)
B324882
58.1%
S166601
29.8%
L10580
 
1.9%
O7936
 
1.4%
U7504
 
1.3%
T7392
 
1.3%
H6928
 
1.2%
R6200
 
1.1%
h4434
 
0.8%
A4343
 
0.8%
Other values (21)12786
 
2.3%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter554041
99.0%
Lowercase Letter4551
 
0.8%
Dash Punctuation987
 
0.2%
Connector Punctuation7
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B324882
58.6%
S166601
30.1%
L10580
 
1.9%
O7936
 
1.4%
U7504
 
1.4%
T7392
 
1.3%
H6928
 
1.3%
R6200
 
1.1%
A4343
 
0.8%
Z3982
 
0.7%
Other values (15)7693
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
h4434
97.4%
e65
 
1.4%
r42
 
0.9%
t10
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
-987
100.0%
Connector Punctuation
ValueCountFrequency (%)
_7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin558592
99.8%
Common994
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
B324882
58.2%
S166601
29.8%
L10580
 
1.9%
O7936
 
1.4%
U7504
 
1.3%
T7392
 
1.3%
H6928
 
1.2%
R6200
 
1.1%
h4434
 
0.8%
A4343
 
0.8%
Other values (19)11792
 
2.1%
Common
ValueCountFrequency (%)
-987
99.3%
_7
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII559536
> 99.9%
None50
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B324882
58.1%
S166601
29.8%
L10580
 
1.9%
O7936
 
1.4%
U7504
 
1.3%
T7392
 
1.3%
H6928
 
1.2%
R6200
 
1.1%
h4434
 
0.8%
A4343
 
0.8%
Other values (19)12736
 
2.3%
None
ValueCountFrequency (%)
Ö49
98.0%
Ü1
 
2.0%

ft_vm
Categorical

HIGH CARDINALITY
MISSING

Distinct: 18232
Distinct (%): 9.7%
Missing: 38301
Missing (%): 16.9%
Memory size: 1.1 MiB
IC 5: 578
IC 8: 462
IC 8 808: 437
IC 1: 416
IC 8 825: 405
Other values (18227): 185938

Length

Max length: 14
Median length: 13
Mean length: 8.92787777
Min length: 3

Characters and Unicode

Total characters: 1680548
Distinct characters: 23
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 6262
Unique (%): 3.3%

Sample

1st row: IC-51-1629
2nd row: IC-1-722
3rd row: IC-6-1076
4th row: IC-3-580
5th row: RE--4818

Common Values

Value                 Count   Frequency (%)
IC 5                  578     0.3%
IC 8                  462     0.2%
IC 8 808              437     0.2%
IC 1                  416     0.2%
IC 8 825              405     0.2%
IC 8 827              403     0.2%
IC 8 829              379     0.2%
IC 8 826              362     0.2%
IC 8 810              358     0.2%
IC 8 806              339     0.1%
Other values (18222)  184097  81.3%
(Missing)             38301   16.9%

Length

Histogram of lengths of the category
Value                 Count   Frequency (%)
s                     53411   11.1%
ic                    43969   9.1%
ir                    40123   8.3%
5                     13767   2.9%
8                     12735   2.6%
1                     11067   2.3%
3                     10270   2.1%
re                    8794    1.8%
75                    6602    1.4%
r                     5513    1.1%
Other values (14443)  274528  57.1%

Most occurring characters

ValueCountFrequency (%)
309520
18.4%
1196200
11.7%
2150803
9.0%
I109810
 
6.5%
5108139
 
6.4%
3101937
 
6.1%
691891
 
5.5%
887046
 
5.2%
783765
 
5.0%
R67627
 
4.0%
Other values (13)373810
22.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number994217
59.2%
Uppercase Letter316905
 
18.9%
Space Separator309520
 
18.4%
Dash Punctuation59906
 
3.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I109810
34.7%
R67627
21.3%
C64039
20.2%
S56077
17.7%
E17690
 
5.6%
G511
 
0.2%
T500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L33
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1196200
19.7%
2150803
15.2%
5108139
10.9%
3101937
10.3%
691891
9.2%
887046
8.8%
783765
8.4%
463272
 
6.4%
056052
 
5.6%
955112
 
5.5%
Space Separator
ValueCountFrequency (%)
309520
100.0%
Dash Punctuation
ValueCountFrequency (%)
-59906
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common1363643
81.1%
Latin316905
 
18.9%

Most frequent character per script

Common
ValueCountFrequency (%)
309520
22.7%
1196200
14.4%
2150803
11.1%
5108139
 
7.9%
3101937
 
7.5%
691891
 
6.7%
887046
 
6.4%
783765
 
6.1%
463272
 
4.6%
-59906
 
4.4%
Other values (2)111164
 
8.2%
Latin
ValueCountFrequency (%)
I109810
34.7%
R67627
21.3%
C64039
20.2%
S56077
17.7%
E17690
 
5.6%
G511
 
0.2%
T500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L33
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII1680548
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
309520
18.4%
1196200
11.7%
2150803
9.0%
I109810
 
6.5%
5108139
 
6.4%
3101937
 
6.1%
691891
 
5.5%
887046
 
5.2%
783765
 
5.0%
R67627
 
4.0%
Other values (13)373810
22.2%

ft_vm_kurz
Categorical

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 38301
Missing (%): 16.9%
Memory size: 221.7 KiB
IC: 56317
S: 55968
IR: 50859
RE: 9915
R: 6813
Other values (7): 8364

Length

Max length: 3
Median length: 2
Mean length: 1.683126501
Min length: 1

Characters and Unicode

Total characters: 316825
Distinct characters: 10
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: IC
2nd row: IC
3rd row: IC
4th row: IC
5th row: RE

Common Values

Value             Count  Frequency (%)
IC                56317  24.9%
S                 55968  24.7%
IR                50859  22.5%
RE                9915   4.4%
R                 6813   3.0%
EC                5113   2.3%
ICE               2609   1.2%
TGV               500    0.2%
SN                108    < 0.1%
IRE               25     < 0.1%
Other values (2)  9      < 0.1%
(Missing)         38301  16.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ic56317
29.9%
s55968
29.7%
ir50859
27.0%
re9915
 
5.3%
r6813
 
3.6%
ec5113
 
2.7%
ice2609
 
1.4%
tgv500
 
0.3%
sn108
 
0.1%
ire25
 
< 0.1%
Other values (2)9
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
I109810
34.7%
R67612
21.3%
C64039
20.2%
S56077
17.7%
E17670
 
5.6%
T500
 
0.2%
G500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L1
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter316825
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
I109810
34.7%
R67612
21.3%
C64039
20.2%
S56077
17.7%
E17670
 
5.6%
T500
 
0.2%
G500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L1
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin316825
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
I109810
34.7%
R67612
21.3%
C64039
20.2%
S56077
17.7%
E17670
 
5.6%
T500
 
0.2%
G500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII316825
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
I109810
34.7%
R67612
21.3%
C64039
20.2%
S56077
17.7%
E17670
 
5.6%
T500
 
0.2%
G500
 
0.2%
V500
 
0.2%
N116
 
< 0.1%
L1
 
< 0.1%

ft_zielort_uic
Real number (ℝ≥0)

MISSING

Distinct: 1464
Distinct (%): 0.8%
Missing: 38301
Missing (%): 16.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 8500531.918
Minimum: 5501362
Maximum: 8814001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 5501362
5-th percentile: 8500023
Q1: 8503000
median: 8503127
Q3: 8506290
95-th percentile: 8509000
Maximum: 8814001
Range: 3312639
Interquartile range (IQR): 3290

Descriptive statistics

Standard deviation: 42794.67661
Coefficient of variation (CV): 0.005034352793
Kurtosis: 611.7592452
Mean: 8500531.918
Median Absolute Deviation (MAD): 1873
Skewness: -16.80357096
Sum: 1.600106126 × 10^12
Variance: 1831384346
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count  Frequency (%)
8503000              35107  15.5%
8507000              17066  7.5%
8505000              9928   4.4%
8500218              8140   3.6%
8500010              7156   3.2%
8506000              5432   2.4%
8506302              4604   2.0%
8503016              3756   1.7%
8509000              3721   1.6%
8502113              2831   1.2%
Other values (1454)  90495  39.9%
(Missing)            38301  16.9%

Value    Count  Frequency (%)
5501362  1      < 0.1%
5510017  3      < 0.1%
8001071  15     < 0.1%
8001093  1      < 0.1%
8002140  1      < 0.1%
8002181  3      < 0.1%
8002253  3      < 0.1%
8002301  2      < 0.1%
8002307  1      < 0.1%
8002371  7      < 0.1%

Value    Count  Frequency (%)
8814001  1      < 0.1%
8774647  1      < 0.1%
8774600  1      < 0.1%
8774549  6      < 0.1%
8774538  1      < 0.1%
8772568  1      < 0.1%
8772319  1      < 0.1%
8771513  1      < 0.1%
8771304  5      < 0.1%
8768888  1      < 0.1%

fg_abfahrt
Date

MISSING

Distinct: 1340
Distinct (%): 0.7%
Missing: 27467
Missing (%): 12.1%
Memory size: 1.7 MiB
Minimum: 2022-11-18 00:00:00
Maximum: 2022-11-18 23:59:00
Histogram with fixed size bins (bins=50)

fg_ankunft
Date

MISSING

Distinct: 1363
Distinct (%): 0.7%
Missing: 27467
Missing (%): 12.1%
Memory size: 1.7 MiB
Minimum: 2022-11-18 00:00:00
Maximum: 2022-11-18 23:59:00
Histogram with fixed size bins (bins=50)

fg_startort_uic
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 10832
Distinct (%): 5.4%
Missing: 27467
Missing (%): 12.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 8510008.922
Minimum: 1101316
Maximum: 8891702
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 1101316
5-th percentile: 8500026
Q1: 8503000
median: 8505000
Q3: 8507555
95-th percentile: 8587771
Maximum: 8891702
Range: 7790386
Interquartile range (IQR): 4555

Descriptive statistics

Standard deviation: 108047.7434
Coefficient of variation (CV): 0.01269654878
Kurtosis: 3650.066826
Mean: 8510008.922
Median Absolute Deviation (MAD): 2172
Skewness: -54.12944026
Sum: 1.694087476 × 10^12
Variance: 1.167431486 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
8503000               13118   5.8%
8507000               8257    3.6%
8500010               6758    3.0%
8503016               5753    2.5%
8505000               5707    2.5%
8506000               3269    1.4%
8500218               2541    1.1%
8506302               2478    1.1%
8502113               2004    0.9%
8507100               1934    0.9%
Other values (10822)  147251  65.0%
(Missing)             27467   12.1%

Value    Count  Frequency (%)
1101316  1      < 0.1%
1101327  1      < 0.1%
1101954  1      < 0.1%
1101957  1      < 0.1%
1102034  1      < 0.1%
1104935  1      < 0.1%
1106493  26     < 0.1%
1401810  1      < 0.1%
5103865  1      < 0.1%
5510017  2      < 0.1%

Value    Count  Frequency (%)
8891702  1      < 0.1%
8812005  1      < 0.1%
8776302  1      < 0.1%
8775100  1      < 0.1%
8774687  2      < 0.1%
8774600  3      < 0.1%
8774564  2      < 0.1%
8774549  3      < 0.1%
8774232  1      < 0.1%
8772319  7      < 0.1%

fg_zielort_uic
Real number (ℝ≥0)

MISSING
SKEWED

Distinct: 10607
Distinct (%): 5.3%
Missing: 27467
Missing (%): 12.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 8507925.039
Minimum: 1101322
Maximum: 8831138
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.7 MiB

Quantile statistics

Minimum: 1101322
5-th percentile: 8500023
Q1: 8503000
median: 8505000
Q3: 8507483
95-th percentile: 8588604.65
Maximum: 8831138
Range: 7729816
Interquartile range (IQR): 4483

Descriptive statistics

Standard deviation: 124982.7345
Coefficient of variation (CV): 0.01469015464
Kurtosis: 2744.585839
Mean: 8507925.039
Median Absolute Deviation (MAD): 2054
Skewness: -47.23386188
Sum: 1.693672637 × 10^12
Variance: 1.562068392 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
8503000               18046   8.0%
8507000               9965    4.4%
8505000               6822    3.0%
8500010               6583    2.9%
8503016               4868    2.1%
8506000               4112    1.8%
8506302               3530    1.6%
8500218               2646    1.2%
8502113               2457    1.1%
8507100               1940    0.9%
Other values (10597)  138101  61.0%
(Missing)             27467   12.1%

Value    Count  Frequency (%)
1101322  1      < 0.1%
1101323  1      < 0.1%
1101535  1      < 0.1%
1101894  2      < 0.1%
1101957  3      < 0.1%
1101966  2      < 0.1%
1102138  2      < 0.1%
1102502  1      < 0.1%
1106493  22     < 0.1%
1110000  1      < 0.1%

Value    Count  Frequency (%)
8831138  1      < 0.1%
8814001  1      < 0.1%
8774700  2      < 0.1%
8774687  1      < 0.1%
8774647  1      < 0.1%
8774600  1      < 0.1%
8774564  1      < 0.1%
8774549  8      < 0.1%
8774538  1      < 0.1%
8774500  1      < 0.1%

fg_startort
Categorical

HIGH CARDINALITY
MISSING

Distinct: 12165
Distinct (%): 5.5%
Missing: 6590
Missing (%): 2.9%
Memory size: 796.1 KiB
Zürich HB: 14097
Bern: 8847
Basel SBB: 7348
Luzern: 6125
Zürich Flughafen: 6109
Other values (12160): 177421

Length

Max length: 33
Median length: 26
Mean length: 11.29004715
Min length: 1

Characters and Unicode

Total characters: 2483212
Distinct characters: 91
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3874
Unique (%): 1.8%

Sample

1st row: Biel/Bienne
2nd row: Aarau
3rd row: Zweilütschinen
4th row: Chur
5th row: Buchs SG, Werdenberg

Common Values

Value                 Count   Frequency (%)
Zürich HB             14097   6.2%
Bern                  8847    3.9%
Basel SBB             7348    3.2%
Luzern                6125    2.7%
Zürich Flughafen      6109    2.7%
Winterthur            3565    1.6%
Olten                 2762    1.2%
St. Gallen            2623    1.2%
Aarau                 2123    0.9%
Thun                  2049    0.9%
Other values (12155)  164299  72.5%
(Missing)             6590    2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
zürich27577
 
8.1%
hb14330
 
4.2%
bern10897
 
3.2%
basel8785
 
2.6%
sbb7628
 
2.2%
luzern7017
 
2.1%
flughafen6239
 
1.8%
st4963
 
1.5%
winterthur4919
 
1.4%
dorf4213
 
1.2%
Other values (9201)243073
71.6%

Most occurring characters

ValueCountFrequency (%)
e243874
 
9.8%
n194238
 
7.8%
r177962
 
7.2%
i136920
 
5.5%
a136428
 
5.5%
119753
 
4.8%
l118248
 
4.8%
t110825
 
4.5%
h107515
 
4.3%
s93495
 
3.8%
Other values (81)1043954
42.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1882545
75.8%
Uppercase Letter412384
 
16.6%
Space Separator119753
 
4.8%
Other Punctuation54939
 
2.2%
Dash Punctuation9796
 
0.4%
Open Punctuation1796
 
0.1%
Close Punctuation1796
 
0.1%
Decimal Number196
 
< 0.1%
Math Symbol7
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e243874
13.0%
n194238
10.3%
r177962
 
9.5%
i136920
 
7.3%
a136428
 
7.2%
l118248
 
6.3%
t110825
 
5.9%
h107515
 
5.7%
s93495
 
5.0%
o78594
 
4.2%
Other values (30)484446
25.7%
Uppercase Letter
ValueCountFrequency (%)
B80361
19.5%
S48066
11.7%
Z39198
 
9.5%
H28563
 
6.9%
L23149
 
5.6%
G22819
 
5.5%
W19911
 
4.8%
A18211
 
4.4%
F15861
 
3.8%
R14511
 
3.5%
Other values (23)101734
24.7%
Decimal Number
ValueCountFrequency (%)
474
37.8%
631
15.8%
326
 
13.3%
025
 
12.8%
117
 
8.7%
212
 
6.1%
710
 
5.1%
81
 
0.5%
Other Punctuation
ValueCountFrequency (%)
,42693
77.7%
.7625
 
13.9%
/4475
 
8.1%
'145
 
0.3%
&1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
119753
100.0%
Dash Punctuation
ValueCountFrequency (%)
-9796
100.0%
Open Punctuation
ValueCountFrequency (%)
(1796
100.0%
Close Punctuation
ValueCountFrequency (%)
)1796
100.0%
Math Symbol
ValueCountFrequency (%)
+7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2294929
92.4%
Common188283
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e243874
 
10.6%
n194238
 
8.5%
r177962
 
7.8%
i136920
 
6.0%
a136428
 
5.9%
l118248
 
5.2%
t110825
 
4.8%
h107515
 
4.7%
s93495
 
4.1%
B80361
 
3.5%
Other values (63)895063
39.0%
Common
ValueCountFrequency (%)
119753
63.6%
,42693
 
22.7%
-9796
 
5.2%
.7625
 
4.0%
/4475
 
2.4%
(1796
 
1.0%
)1796
 
1.0%
'145
 
0.1%
474
 
< 0.1%
631
 
< 0.1%
Other values (8)99
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2427453
97.8%
None55759
 
2.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e243874
 
10.0%
n194238
 
8.0%
r177962
 
7.3%
i136920
 
5.6%
a136428
 
5.6%
119753
 
4.9%
l118248
 
4.9%
t110825
 
4.6%
h107515
 
4.4%
s93495
 
3.9%
Other values (60)988195
40.7%
None
ValueCountFrequency (%)
ü41179
73.9%
ä7338
 
13.2%
ö3826
 
6.9%
è1286
 
2.3%
é1067
 
1.9%
â464
 
0.8%
Ü361
 
0.6%
Ä91
 
0.2%
ô40
 
0.1%
Ö33
 
0.1%
Other values (11)74
 
0.1%

fg_zielort
Categorical

HIGH CARDINALITY
MISSING

Distinct: 11995
Distinct (%): 5.5%
Missing: 6587
Missing (%): 2.9%
Memory size: 794.9 KiB
Zürich HB: 19401
Bern: 10621
Luzern: 7333
Basel SBB: 7220
Zürich Flughafen: 5167
Other values (11990): 170208

Length

Max length: 40
Median length: 28
Mean length: 11.20180496
Min length: 2

Characters and Unicode

Total characters: 2463837
Distinct characters: 95
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4131
Unique (%): 1.9%

Sample

1st row: Aesch BL
2nd row: Bern
3rd row: Bern Wankdorf
4th row: Zürich HB
5th row: Muhen Nord

Common Values

Value                 Count   Frequency (%)
Zürich HB             19401   8.6%
Bern                  10621   4.7%
Luzern                7333    3.2%
Basel SBB             7220    3.2%
Zürich Flughafen      5167    2.3%
Winterthur            4419    2.0%
St. Gallen            3720    1.6%
Olten                 2855    1.3%
Aarau                 2636    1.2%
Thun                  2068    0.9%
Other values (11985)  154510  68.2%
(Missing)             6587    2.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
zürich37553
 
10.9%
hb19765
 
5.7%
bern13468
 
3.9%
basel8810
 
2.6%
luzern8663
 
2.5%
sbb7463
 
2.2%
st6514
 
1.9%
winterthur5426
 
1.6%
flughafen5382
 
1.6%
gallen5214
 
1.5%
Other values (9171)226935
65.7%

Most occurring characters

ValueCountFrequency (%)
e230818
 
9.4%
n185671
 
7.5%
r183738
 
7.5%
i137653
 
5.6%
a136482
 
5.5%
125344
 
5.1%
l115711
 
4.7%
h113152
 
4.6%
t107682
 
4.4%
s88237
 
3.6%
Other values (85)1039349
42.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1849570
75.1%
Uppercase Letter421239
 
17.1%
Space Separator125344
 
5.1%
Other Punctuation55196
 
2.2%
Dash Punctuation8259
 
0.3%
Open Punctuation1992
 
0.1%
Close Punctuation1992
 
0.1%
Decimal Number226
 
< 0.1%
Math Symbol19
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e230818
12.5%
n185671
 
10.0%
r183738
 
9.9%
i137653
 
7.4%
a136482
 
7.4%
l115711
 
6.3%
h113152
 
6.1%
t107682
 
5.8%
s88237
 
4.8%
u76195
 
4.1%
Other values (33)474231
25.6%
Uppercase Letter
ValueCountFrequency (%)
B87549
20.8%
Z48001
11.4%
S47582
11.3%
H32296
 
7.7%
L24106
 
5.7%
G22329
 
5.3%
W18370
 
4.4%
A17701
 
4.2%
F14158
 
3.4%
R13358
 
3.2%
Other values (22)95789
22.7%
Decimal Number
ValueCountFrequency (%)
476
33.6%
330
 
13.3%
626
 
11.5%
124
 
10.6%
022
 
9.7%
221
 
9.3%
719
 
8.4%
84
 
1.8%
92
 
0.9%
52
 
0.9%
Other Punctuation
ValueCountFrequency (%)
,40972
74.2%
.8778
 
15.9%
/5328
 
9.7%
'116
 
0.2%
&2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
125344
100.0%
Dash Punctuation
ValueCountFrequency (%)
-8259
100.0%
Open Punctuation
ValueCountFrequency (%)
(1992
100.0%
Close Punctuation
ValueCountFrequency (%)
)1992
100.0%
Math Symbol
ValueCountFrequency (%)
+19
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2270809
92.2%
Common193028
 
7.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e230818
 
10.2%
n185671
 
8.2%
r183738
 
8.1%
i137653
 
6.1%
a136482
 
6.0%
l115711
 
5.1%
h113152
 
5.0%
t107682
 
4.7%
s88237
 
3.9%
B87549
 
3.9%
Other values (65)884116
38.9%
Common
ValueCountFrequency (%)
125344
64.9%
,40972
 
21.2%
.8778
 
4.5%
-8259
 
4.3%
/5328
 
2.8%
(1992
 
1.0%
)1992
 
1.0%
'116
 
0.1%
476
 
< 0.1%
330
 
< 0.1%
Other values (10)141
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII2401727
97.5%
None62110
 
2.5%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e230818
 
9.6%
n185671
 
7.7%
r183738
 
7.7%
i137653
 
5.7%
a136482
 
5.7%
125344
 
5.2%
l115711
 
4.8%
h113152
 
4.7%
t107682
 
4.5%
s88237
 
3.7%
Other values (62)977239
40.7%
None
ValueCountFrequency (%)
ü49122
79.1%
ä6154
 
9.9%
ö3475
 
5.6%
è1253
 
2.0%
é993
 
1.6%
â456
 
0.7%
Ü424
 
0.7%
ô53
 
0.1%
Ä51
 
0.1%
à28
 
< 0.1%
Other values (13)101
 
0.2%

ft_startort
Categorical

HIGH CARDINALITY
MISSING

Distinct: 6252
Distinct (%): 3.0%
Missing: 17466
Missing (%): 7.7%
Memory size: 620.7 KiB
Zürich HB: 21001
Bern: 11773
Basel SBB: 7889
Luzern: 7186
Zürich Flughafen: 5908
Other values (6247): 155314

Length

Max length: 31
Median length: 27
Mean length: 10.31267847
Min length: 1

Characters and Unicode

Total characters: 2156082
Distinct characters: 88
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2888
Unique (%): 1.4%

Sample

1st row: Biel/Bienne
2nd row: Aarau
3rd row: Spiez
4th row: Chur
5th row: Zürich HB

Common Values

Value                Count   Frequency (%)
Zürich HB            21001   9.3%
Bern                 11773   5.2%
Basel SBB            7889    3.5%
Luzern               7186    3.2%
Zürich Flughafen     5908    2.6%
Olten                5434    2.4%
Winterthur           4483    2.0%
St. Gallen           3187    1.4%
Chur                 2544    1.1%
Zug                  2479    1.1%
Other values (6242)  137187  60.6%
(Missing)            17466   7.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
zürich33357
 
10.6%
hb21549
 
6.8%
bahnhof20401
 
6.5%
bern13779
 
4.4%
basel9514
 
3.0%
sbb9240
 
2.9%
luzern8363
 
2.7%
flughafen6340
 
2.0%
olten5833
 
1.9%
winterthur5737
 
1.8%
Other values (4850)180743
57.4%

Most occurring characters

ValueCountFrequency (%)
e182563
 
8.5%
n179967
 
8.3%
r147891
 
6.9%
h134818
 
6.3%
a126702
 
5.9%
B111113
 
5.2%
i110964
 
5.1%
105809
 
4.9%
l92767
 
4.3%
t83496
 
3.9%
Other values (78)879992
40.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter1599255
74.2%
Uppercase Letter398304
 
18.5%
Space Separator105809
 
4.9%
Other Punctuation43210
 
2.0%
Dash Punctuation7572
 
0.4%
Open Punctuation886
 
< 0.1%
Close Punctuation886
 
< 0.1%
Decimal Number99
 
< 0.1%
Math Symbol61
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e182563
11.4%
n179967
11.3%
r147891
 
9.2%
h134818
 
8.4%
a126702
 
7.9%
i110964
 
6.9%
l92767
 
5.8%
t83496
 
5.2%
o73765
 
4.6%
u72631
 
4.5%
Other values (28)393691
24.6%
Uppercase Letter
ValueCountFrequency (%)
B111113
27.9%
Z43840
 
11.0%
S43054
 
10.8%
H32790
 
8.2%
L21822
 
5.5%
G19446
 
4.9%
W17562
 
4.4%
A13864
 
3.5%
F13854
 
3.5%
O12052
 
3.0%
Other values (23)68907
17.3%
Decimal Number
ValueCountFrequency (%)
444
44.4%
612
 
12.1%
111
 
11.1%
311
 
11.1%
710
 
10.1%
210
 
10.1%
01
 
1.0%
Other Punctuation
ValueCountFrequency (%)
,30545
70.7%
.6543
 
15.1%
/6068
 
14.0%
'31
 
0.1%
&23
 
0.1%
Space Separator
ValueCountFrequency (%)
105809
100.0%
Dash Punctuation
ValueCountFrequency (%)
-7572
100.0%
Open Punctuation
ValueCountFrequency (%)
(886
100.0%
Close Punctuation
ValueCountFrequency (%)
)886
100.0%
Math Symbol
ValueCountFrequency (%)
+61
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1997559
92.6%
Common158523
 
7.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 182563 | 9.1%
n | 179967 | 9.0%
r | 147891 | 7.4%
h | 134818 | 6.7%
a | 126702 | 6.3%
B | 111113 | 5.6%
i | 110964 | 5.6%
l | 92767 | 4.6%
t | 83496 | 4.2%
o | 73765 | 3.7%
Other values (61) | 753513 | 37.7%

Common
Value | Count | Frequency (%)
(space) | 105809 | 66.7%
, | 30545 | 19.3%
- | 7572 | 4.8%
. | 6543 | 4.1%
/ | 6068 | 3.8%
( | 886 | 0.6%
) | 886 | 0.6%
+ | 61 | < 0.1%
4 | 44 | < 0.1%
' | 31 | < 0.1%
Other values (7) | 78 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2104889 | 97.6%
None | 51193 | 2.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 182563 | 8.7%
n | 179967 | 8.5%
r | 147891 | 7.0%
h | 134818 | 6.4%
a | 126702 | 6.0%
B | 111113 | 5.3%
i | 110964 | 5.3%
(space) | 105809 | 5.0%
l | 92767 | 4.4%
t | 83496 | 4.0%
Other values (59) | 828799 | 39.4%

None
Value | Count | Frequency (%)
ü | 41974 | 82.0%
ä | 4667 | 9.1%
ö | 1508 | 2.9%
è | 1154 | 2.3%
é | 867 | 1.7%
â | 515 | 1.0%
Ü | 360 | 0.7%
Ä | 79 | 0.2%
Ö | 23 | < 0.1%
ì | 14 | < 0.1%
Other values (9) | 32 | 0.1%

ft_zielort
Categorical

HIGH CARDINALITY
MISSING

Distinct: 5341
Distinct (%): 2.6%
Missing: 17463
Missing (%): 7.7%
Memory size: 613.7 KiB

Value | Count
Zürich HB | 36448
Bern | 17714
Luzern | 10435
Olten | 8345
Basel SBB | 7792
Other values (5336) | 128340

Length

Max length: 40
Median length: 29
Mean length: 8.80995724
Min length: 3

Characters and Unicode

Total characters: 1841933
Distinct characters: 85
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
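
As a minimal sketch of how such character properties can be read, the snippet below counts characters by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` and the three sample strings are illustrative assumptions; this is not the profiling tool's own implementation. The two-letter codes map onto the category names used in this report, for example `Ll` is Lowercase Letter, `Lu` is Uppercase Letter, `Zs` is Space Separator and `Po` is Other Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category over an iterable of strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            # unicodedata.category returns a two-letter code such as "Ll" or "Lu"
            counts[unicodedata.category(ch)] += 1
    return counts


# Toy input: three station names taken from the Common Values list
sample = ["Zürich HB", "Bern", "St. Gallen"]
counts = category_counts(sample)
for code, n in counts.most_common():
    print(code, n)
```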

Unique

Unique: 2790
Unique (%): 1.3%

Sample

1st row: Laufen
2nd row: Bern
3rd row: Bern
4th row: Zürich HB
5th row: Aarau

Common Values

Value | Count | Frequency (%)
Zürich HB | 36448 | 16.1%
Bern | 17714 | 7.8%
Luzern | 10435 | 4.6%
Olten | 8345 | 3.7%
Basel SBB | 7792 | 3.4%
Winterthur | 5739 | 2.5%
St. Gallen | 4794 | 2.1%
Zürich Flughafen | 4055 | 1.8%
Chur | 3866 | 1.7%
Aarau | 3009 | 1.3%
Other values (5331) | 106877 | 47.2%
(Missing) | 17463 | 7.7%

Length

Histogram of lengths of the category
Value | Count | Frequency (%)
zürich | 49499 | 16.7%
hb | 36985 | 12.5%
bern | 18628 | 6.3%
luzern | 10752 | 3.6%
olten | 8433 | 2.8%
basel | 8217 | 2.8%
sbb | 7983 | 2.7%
st | 6064 | 2.0%
winterthur | 5999 | 2.0%
gallen | 5356 | 1.8%
Other values (4489) | 138167 | 46.7%

Most occurring characters

Value | Count | Frequency (%)
e | 163507 | 8.9%
r | 156410 | 8.5%
n | 140270 | 7.6%
i | 111099 | 6.0%
B | 98836 | 5.4%
h | 95451 | 5.2%
(space) | 87040 | 4.7%
a | 84530 | 4.6%
l | 82998 | 4.5%
t | 71412 | 3.9%
Other values (75) | 750380 | 40.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1346108 | 73.1%
Uppercase Letter | 385323 | 20.9%
Space Separator | 87040 | 4.7%
Other Punctuation | 15562 | 0.8%
Dash Punctuation | 6284 | 0.3%
Open Punctuation | 743 | < 0.1%
Close Punctuation | 743 | < 0.1%
Decimal Number | 126 | < 0.1%
Math Symbol | 4 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 163507 | 12.1%
r | 156410 | 11.6%
n | 140270 | 10.4%
i | 111099 | 8.3%
h | 95451 | 7.1%
a | 84530 | 6.3%
l | 82998 | 6.2%
t | 71412 | 5.3%
c | 65734 | 4.9%
u | 63053 | 4.7%
Other values (27) | 311644 | 23.2%

Uppercase Letter
Value | Count | Frequency (%)
B | 98836 | 25.7%
Z | 57774 | 15.0%
H | 43850 | 11.4%
S | 36263 | 9.4%
L | 21682 | 5.6%
G | 18169 | 4.7%
W | 14556 | 3.8%
O | 14321 | 3.7%
A | 13003 | 3.4%
R | 9551 | 2.5%
Other values (22) | 57318 | 14.9%

Decimal Number
Value | Count | Frequency (%)
4 | 36 | 28.6%
3 | 20 | 15.9%
6 | 20 | 15.9%
7 | 20 | 15.9%
2 | 20 | 15.9%
1 | 10 | 7.9%

Other Punctuation
Value | Count | Frequency (%)
. | 6818 | 43.8%
, | 4677 | 30.1%
/ | 4045 | 26.0%
' | 20 | 0.1%
& | 2 | < 0.1%

Space Separator
Value | Count | Frequency (%)
(space) | 87040 | 100.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 6284 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 743 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 743 | 100.0%

Math Symbol
Value | Count | Frequency (%)
+ | 4 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 1731431 | 94.0%
Common | 110502 | 6.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 163507 | 9.4%
r | 156410 | 9.0%
n | 140270 | 8.1%
i | 111099 | 6.4%
B | 98836 | 5.7%
h | 95451 | 5.5%
a | 84530 | 4.9%
l | 82998 | 4.8%
t | 71412 | 4.1%
c | 65734 | 3.8%
Other values (59) | 661184 | 38.2%

Common
Value | Count | Frequency (%)
(space) | 87040 | 78.8%
. | 6818 | 6.2%
- | 6284 | 5.7%
, | 4677 | 4.2%
/ | 4045 | 3.7%
( | 743 | 0.7%
) | 743 | 0.7%
4 | 36 | < 0.1%
' | 20 | < 0.1%
3 | 20 | < 0.1%
Other values (6) | 76 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1779248 | 96.6%
None | 62685 | 3.4%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 163507 | 9.2%
r | 156410 | 8.8%
n | 140270 | 7.9%
i | 111099 | 6.2%
B | 98836 | 5.6%
h | 95451 | 5.4%
(space) | 87040 | 4.9%
a | 84530 | 4.8%
l | 82998 | 4.7%
t | 71412 | 4.0%
Other values (58) | 687695 | 38.7%

None
Value | Count | Frequency (%)
ü | 55870 | 89.1%
ä | 3191 | 5.1%
ö | 1009 | 1.6%
è | 909 | 1.5%
é | 684 | 1.1%
â | 510 | 0.8%
Ü | 420 | 0.7%
Ä | 35 | 0.1%
Ö | 19 | < 0.1%
ô | 9 | < 0.1%
Other values (7) | 29 | < 0.1%

Interactions


Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
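
The rank-based definition above can be sketched in plain Python as follows. The function names (`average_ranks`, `pearson`, `spearman`) and the toy data are illustrative assumptions, and giving tied values their average rank is one common convention rather than something this report specifies.

```python
from math import sqrt
from statistics import mean


def average_ranks(xs):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance divided by the product of standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: rho reaches 1 while r stays below 1
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(spearman(x, y), pearson(x, y))
```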

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
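
A short sketch of this formula, together with a check of the location and scale invariance mentioned above, might look like this. The function name `pearson_r` and the toy numbers are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

r_original = pearson_r(xs, ys)
# Shift and rescale each variable separately (positive scale factors)
r_transformed = pearson_r([2 * x + 3 for x in xs], [0.5 * y - 1 for y in ys])
print(r_original, r_transformed)
```

Both printed values agree up to floating-point rounding, because the shifts cancel in the deviations from the mean and the positive scale factors cancel between numerator and denominator.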

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
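
The pair-counting description above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is an assumption, and statistical libraries often report a tie-adjusted variant instead, so values may differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Three observations: two concordant pairs and one discordant pair
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```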

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
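
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013), is shown below. The function name `cramers_v_corrected` and the toy labels are assumptions, and the report's own code may differ in detail. With n observations, r row categories and k column categories, it uses φ² = χ²/n, the corrected φ̃² = max(0, φ² − (k−1)(r−1)/(n−1)), r̃ = r − (r−1)²/(n−1), k̃ = k − (k−1)²/(n−1), and V = sqrt(φ̃² / min(k̃−1, r̃−1)).

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    # Pearson chi-squared statistic over the full contingency table
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy labels: a perfectly associated pair and an independent pair
perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(perfect, independent)
```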

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
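
The nullity correlation shown in the heatmap can be approximated by correlating presence indicators (1 if a value is present, 0 if it is missing) for two columns. The sketch below is one way to do that in plain Python; the function name `nullity_correlation` and the toy values are assumptions, and only the column names `device_type` and `dispcode` come from this report (both are listed with 82209 missing values).

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is always present or always missing has no defined correlation
        return float("nan")
    return cov / sqrt(var_a * var_b)


# Toy data: both columns are missing on exactly the same rows
device_type = ["Desktop", None, "Desktop", None]
dispcode = ["Beendet", None, "Beendet", None]
print(nullity_correlation(device_type, dispcode))
```

A value near 1 means the two columns tend to be missing together, while a value near -1 means one tends to be present when the other is missing.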

Sample

First rows

(row) | df_index | participant_id | u_date | S_alter | S_sex | S_wohnsitz | u_klassencode | u_ga | S_AB3_HTA | R_anschluss | R_stoerung | device_type | dispcode | u_ticket | u_fahrausweis | u_preis | R_zweck | ft_abfahrt | ft_ankunft | ft_startort_uic | ft_tu | ft_vm | ft_vm_kurz | ft_zielort_uic | fg_abfahrt | fg_ankunft | fg_startort_uic | fg_zielort_uic | fg_startort | fg_zielort | ft_startort | ft_zielort
0 | 226335 | 583619 | 2022-10-30 | 55.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet | Mobile-Ticket | Normales Billett | 13.5 | Freizeit und Unterhaltung | 2022-11-18 16:49:00 | 2022-11-18 17:34:00 | 8504300 | SBB | IC-51-1629 | IC | 8500113 | 2022-11-18 16:49:00 | 2022-11-18 18:10:00 | 8504300 | 8500117 | Biel/Bienne | Aesch BL | Biel/Bienne | Laufen
1 | 226400 | 583724 | 2022-10-30 | 20.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | NaN | Desktop | Beendet | Mobile-Ticket | Normales Billett | 16.5 | Freizeit und Unterhaltung | 2022-11-18 14:30:00 | 2022-11-18 15:24:00 | 8502113 | SBB | IC-1-722 | IC | 8507000 | 2022-11-18 14:30:00 | 2022-11-18 15:24:00 | 8502113 | 8507000 | Aarau | Bern | Aarau | Bern
2 | 226141 | 583352 | 2022-10-30 | 56.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | nein | NaN | NaN | Desktop | Ausgescreent | NaN | GA | NaN | Freizeit und Unterhaltung | 2022-11-18 15:54:00 | 2022-11-18 16:24:00 | 8507483 | SBB | IC-6-1076 | IC | 8507000 | 2022-11-18 15:12:00 | 2022-11-18 16:33:00 | 8507389 | 8516161 | Zweilütschinen | Bern Wankdorf | Spiez | Bern
3 | 226748 | 584194 | 2022-10-30 | 26.0 | weiblich | In der Schweiz / Liechtenstein | NaN | NaN | nein | Ja | Nein | Desktop | Beendet | Mobile-Ticket | Normales Billett | 15.5 | Freizeit und Unterhaltung | 2022-11-18 17:08:00 | 2022-11-18 18:22:00 | 8509000 | SBB | IC-3-580 | IC | 8503000 | 2022-11-18 17:08:00 | 2022-11-18 18:22:00 | 8509000 | 8503000 | Chur | Zürich HB | Chur | Zürich HB
4 | 226415 | 583744 | 2022-10-30 | 43.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet nach Unterbrechung | Mobile-Ticket | Normales Billett | 29.5 | Freizeit und Unterhaltung | 2022-11-18 11:38:00 | 2022-11-18 12:05:00 | 8503000 | SBB | RE--4818 | RE | 8502113 | 2022-11-18 10:02:00 | 2022-11-18 12:23:00 | 8574039 | 8502180 | Buchs SG, Werdenberg | Muhen Nord | Zürich HB | Aarau
5 | 226408 | 583737 | 2022-10-30 | 47.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet | Mobile-Ticket | Normales Billett | 2.0 | Freizeit und Unterhaltung | 2022-11-18 07:26:00 | 2022-11-18 08:02:00 | 8507100 | SBB | IC-8-804 | IC | 8501605 | 2022-11-18 07:26:00 | 2022-11-18 08:02:00 | 8507100 | 8501605 | Thun | Visp | Thun | Visp
6 | 226754 | 584201 | 2022-10-30 | 28.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet | NaN | Sparbillett | 17.8 | Freizeit und Unterhaltung | 2022-11-18 12:04:00 | 2022-11-18 13:10:00 | 8507000 | SBB | IC-61-968 | IC | 8500010 | 2022-11-18 12:04:00 | 2022-11-18 13:38:00 | 8507000 | 8588764 | Bern | Arlesheim, Im Lee | Bern | Basel SBB
7 | 226404 | 583732 | 2022-10-30 | 71.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet | Easy Ride | Normales Billett | 19.6 | Freizeit und Unterhaltung | 2022-11-18 16:32:00 | 2022-11-18 16:51:00 | 8502280 | SBB | IR-70-2631 | IR | 8505000 | 2022-11-18 15:42:00 | 2022-11-18 18:08:00 | 8502747 | 8508100 | Oberägeri, Ländli | Langenthal | Zug, Bahnhofplatz | Luzern
8 | 226403 | 583727 | 2022-10-30 | 32.0 | weiblich | In der Schweiz / Liechtenstein | NaN | NaN | nein | Ja | NaN | Desktop | Beendet | Mobile-Ticket | Normales Billett | 3.9 | Freizeit und Unterhaltung | 2022-11-18 00:14:00 | 2022-11-18 00:30:00 | 8500023 | SOB | IR--2347 | IR | 8500218 | 2022-11-18 00:14:00 | 2022-11-18 00:30:00 | 8500023 | 8500218 | Liestal | Olten | Liestal | Olten
9 | 226760 | 584208 | 2022-10-30 | 67.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | NaN | ja | Ja | Nein | Desktop | Beendet | Mobile-Ticket | Normales Billett | 10.1 | Freizeit und Unterhaltung | 2022-11-18 16:33:00 | 2022-11-18 16:55:00 | 8530237 | MGB | R-43-550 | R | 8501672 | 2022-11-18 16:03:00 | 2022-11-18 17:05:00 | 8501698 | 8578899 | Riederalp Mitte | Ernen, Aragon | Mörel (Riederalpbahn) | Fiesch

Last rows

(row) | df_index | participant_id | u_date | S_alter | S_sex | S_wohnsitz | u_klassencode | u_ga | S_AB3_HTA | R_anschluss | R_stoerung | device_type | dispcode | u_ticket | u_fahrausweis | u_preis | R_zweck | ft_abfahrt | ft_ankunft | ft_startort_uic | ft_tu | ft_vm | ft_vm_kurz | ft_zielort_uic | fg_abfahrt | fg_ankunft | fg_startort_uic | fg_zielort_uic | fg_startort | fg_zielort | ft_startort | ft_zielort
226527 | 880 | 51615 | 2019-01-02 | 26.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Ja | Nein | NaN | NaN | Mobile-Ticket | NaN | 18.6 | Freizeit und Unterhaltung | 2022-11-18 18:47:00 | 2022-11-18 19:23:00 | 8503424 | SBB | IC 4 281 | IC | 8503000 | 2022-11-18 18:47:00 | 2022-11-18 20:28:00 | 8503424 | 8502853 | Schaffhausen | Unterägeri, Spinnerei | Schaffhausen | Zürich HB
226528 | 881 | 51616 | 2019-01-02 | 44.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | NaN | Nein | NaN | NaN | Mobile-Ticket | NaN | 4.5 | Freizeit und Unterhaltung | 2022-11-18 15:14:00 | 2022-11-18 15:31:00 | 8509197 | RhB | IR 1145 | IR | 8509198 | 2022-11-18 15:14:00 | 2022-11-18 15:31:00 | 8509197 | 8509198 | Bergün/Bravuogn | Preda | Bergün/Bravuogn | Preda
226529 | 882 | 51618 | 2019-01-02 | 69.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Ja | Nein | NaN | NaN | Mobile-Ticket | NaN | 10.8 | Freizeit und Unterhaltung | 2022-11-18 16:15:00 | 2022-11-18 16:30:00 | 8587020 | SBB | S 11 19163 | S | 8503000 | 2022-11-18 16:00:00 | 2022-11-18 17:05:00 | 8590618 | 8587984 | Geroldswil, Zentrum | Meilen, Schwabach | Dietikon, Bahnhof | Zürich HB
226530 | 883 | 51619 | 2019-01-02 | 68.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Ja | Nein | NaN | NaN | Mobile-Ticket | NaN | 17.0 | Freizeit und Unterhaltung | 2022-11-18 18:33:00 | 2022-11-18 19:35:00 | 8503000 | SBB | IC 1 727 | IC | 8506302 | 2022-11-18 18:33:00 | 2022-11-18 19:54:00 | 8503000 | 8506290 | Zürich HB | Herisau | Zürich HB | St. Gallen
226531 | 884 | 51620 | 2019-01-02 | 35.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Ja | Nein | NaN | NaN | Mobile-Ticket | NaN | 15.0 | Sonstige | 2022-11-18 22:46:00 | 2022-11-18 22:55:00 | 8503016 | SBB | IC 8 838 | IC | 8503000 | 2022-11-18 22:46:00 | 2022-11-18 23:49:00 | 8503016 | 8505000 | Zürich Flughafen | Luzern | Zürich Flughafen | Zürich HB
226532 | 885 | 51621 | 2019-01-02 | 37.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | NaN | Nein | NaN | NaN | Mobile-Ticket | NaN | 17.0 | Freizeit und Unterhaltung | 2022-11-18 09:50:00 | 2022-11-18 11:01:00 | 8509002 | RhB | RE 1229 | RE | 8509265 | 2022-11-18 09:50:00 | 2022-11-18 11:01:00 | 8509002 | 8509265 | Landquart | Guarda | Landquart | Guarda
226533 | 886 | 51622 | 2019-01-02 | 27.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Nein | Ja | NaN | NaN | Mobile-Ticket | NaN | 36.6 | Sonstige | 2022-11-18 07:30:00 | 2022-11-18 08:06:00 | 8500218 | SBB | IR 17 2359 | IR | 8503000 | 2022-11-18 07:30:00 | 2022-11-18 08:58:00 | 8500218 | 8590646 | Olten | Herrliberg, Wetzwil | Olten | Zürich HB
226534 | 887 | 51623 | 2019-01-02 | 52.0 | weiblich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | Ja | Nein | NaN | NaN | Mobile-Ticket | NaN | 24.0 | Freizeit und Unterhaltung | 2022-11-18 13:15:00 | 2022-11-18 14:43:00 | 8503006 | SBB | IC 5 1522 | IC | 8504300 | 2022-11-18 13:15:00 | 2022-11-18 14:56:00 | 8503006 | 8504419 | Zürich Oerlikon | Biel Mett | Zürich Oerlikon | Biel/Bienne
226535 | 888 | 51624 | 2019-01-02 | 39.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | NaN | Nein | NaN | NaN | Mobile-Ticket | NaN | 21.5 | Freizeit und Unterhaltung | 2022-11-18 17:00:00 | 2022-11-18 17:40:00 | 8507000 | SBB | IR 15 2527 | IR | 8502007 | 2022-11-18 16:22:00 | 2022-11-18 17:40:00 | 8504102 | 8502007 | Schmitten | Sursee | Bern | Sursee
226536 | 913 | 51661 | 2019-01-02 | 38.0 | männlich | In der Schweiz / Liechtenstein | 2. Klasse | kein GA | ja | NaN | Ja | NaN | NaN | Mobile-Ticket | NaN | 16.8 | Freizeit und Unterhaltung | 2022-11-18 16:10:00 | 2022-11-18 16:55:00 | 8506304 | SOB | S 4 11462 | S | 8506201 | 2022-11-18 16:10:00 | 2022-11-18 16:55:00 | 8506304 | 8506201 | Mörschwil | Lichtensteig | Mörschwil | Lichtensteig